Fix incorrect logtype encoding by properly escaping variable placeholder characters (fixes #163). #162

Merged: 32 commits into y-scope:main on Dec 5, 2023

@LinZhihao-723 (Member) commented Sep 18, 2023

Reference

#163

Description

When the logtype dictionary is built, variable placeholder characters that appear in the original log events are not escaped, which causes #163. This PR fixes the problem by escaping these characters and tracking the positions of the escaped placeholders in each logtype dictionary entry. It also handles the escaping in the ffi-related code.
Rules we added in this PR for logtype encoding:

  1. Variable placeholder characters are escaped when encoding a logtype for compression.
  2. Variable placeholder characters are double escaped when building a search query. This is required for the wildcard
  3. When building a query, if the input contains an escaped wildcard character, we should not escape the escape character.
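
The three rules above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the placeholder byte values, the backslash as escape character, `*` and `?` as wildcards, and the helper names `escape_for_compression` and `escape_for_query` are hypothetical and are not taken from the PR. The sketch also assumes that "double escaped" means two escape characters are emitted before the placeholder, and that a literal escape character is itself escaped during compression.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical values chosen for illustration; the PR text does not list them.
constexpr char cEscape = '\\';
constexpr char cIntPlaceholder = '\x11';
constexpr char cDictPlaceholder = '\x12';
constexpr char cFloatPlaceholder = '\x13';

bool is_placeholder(char c) {
    return cIntPlaceholder == c || cDictPlaceholder == c || cFloatPlaceholder == c;
}

bool is_wildcard(char c) {
    return '*' == c || '?' == c;
}

// Rule 1: when encoding a logtype for compression, prefix every placeholder
// character found in the raw text with one escape character. Escaping the
// escape character itself is an assumption that keeps decoding unambiguous.
std::string escape_for_compression(std::string_view text) {
    std::string out;
    for (char c : text) {
        if (is_placeholder(c) || cEscape == c) {
            out += cEscape;
        }
        out += c;
    }
    return out;
}

// Rules 2 and 3: when building a search query, emit two escape characters
// before each placeholder character, and copy an already-escaped wildcard
// through unchanged instead of escaping its escape character.
std::string escape_for_query(std::string_view text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char const c = text[i];
        if (cEscape == c && i + 1 < text.size() && is_wildcard(text[i + 1])) {
            // Rule 3: keep the escaped wildcard exactly as written.
            out += c;
            out += text[i + 1];
            ++i;
        } else if (is_placeholder(c)) {
            // Rule 2: double-escape the placeholder character.
            out += cEscape;
            out += cEscape;
            out += c;
        } else {
            out += c;
        }
    }
    return out;
}
```

Under these assumptions, a raw message containing the byte `0x11` is stored with one backslash in front of it, a query for the same text carries two backslashes in front of it, and a query such as `x\*y` passes through unchanged.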

Validation performed

@LinZhihao-723 marked this pull request as ready for review September 18, 2023 21:33

@kirkrodrigues (Member) left a comment

We also need to update the logtype query generation in ffi::search. That instance is a bit more complicated: If the logtype query contains no wildcards, we don't want to escape the escape characters (double-escape) since the logtype query won't be used in wildcard matches. I guess the most straightforward way to do this would be to do it while searching the logtype for wildcards in the Subquery constructor.
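
A rough sketch of that suggestion, under stated assumptions: the input logtype query already has each placeholder character preceded by one escape character, the escape character is a backslash, and the wildcards are `*` and `?`. The names `LogtypeQuery`, `has_unescaped_wildcard`, `add_escape_level`, and `finalize_logtype_query` are hypothetical and do not come from `ffi::search` or the `Subquery` class. The extra escape is added only when an unescaped wildcard is found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical values chosen for illustration; not taken from the PR.
constexpr char cEscape = '\\';
constexpr char cIntPlaceholder = '\x11';
constexpr char cDictPlaceholder = '\x12';
constexpr char cFloatPlaceholder = '\x13';

bool is_placeholder(char c) {
    return cIntPlaceholder == c || cDictPlaceholder == c || cFloatPlaceholder == c;
}

// Returns true if the text contains a '*' or '?' that is not preceded by an
// escape character.
bool has_unescaped_wildcard(std::string_view text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (cEscape == text[i]) {
            ++i;  // Skip the escaped character.
            continue;
        }
        if ('*' == text[i] || '?' == text[i]) {
            return true;
        }
    }
    return false;
}

// Adds one more escape character in front of every escaped placeholder and
// copies every other escaped pair (such as an escaped wildcard) unchanged.
std::string add_escape_level(std::string_view text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char const c = text[i];
        if (cEscape == c && i + 1 < text.size()) {
            if (is_placeholder(text[i + 1])) {
                out += cEscape;
            }
            out += c;
            out += text[i + 1];
            ++i;
        } else {
            out += c;
        }
    }
    return out;
}

struct LogtypeQuery {
    std::string text;
    bool has_wildcards;
};

// Double-escapes only when the query will be used in a wildcard match.
LogtypeQuery finalize_logtype_query(std::string_view escaped_logtype) {
    if (false == has_unescaped_wildcard(escaped_logtype)) {
        return {std::string{escaped_logtype}, false};
    }
    return {add_escape_level(escaped_logtype), true};
}
```

In this sketch a query without wildcards is returned untouched, so it can be compared against dictionary entries directly, while a query with a wildcard gains the second escape level in the same step that detects the wildcard.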

Resolved review threads (outdated):
components/core/src/string_utils.cpp
components/core/src/ir/parsing.hpp (2 threads)
components/core/src/EncodedVariableInterpreter.cpp
components/core/src/LogTypeDictionaryEntry.hpp (5 threads)
@jackluo923 self-requested a review December 5, 2023 04:02
@LinZhihao-723 merged commit 3dcac07 into y-scope:main Dec 5, 2023
5 checks passed
@kirkrodrigues changed the title from "Fix Dictionary Collision by tracking the escaped variable placeholders" to "Fix incorrect logtype encoding by properly escaping variable placeholder characters (fixes #163)." on May 7, 2024
3 participants